Measuring and Detecting Virality on Social Media: The Case of Twitter’s Viral Tweets Topic
Abstract.
Social media posts may go viral, reaching large numbers of people within a short period of time. Such posts may threaten the public dialogue if they contain misleading content, making their early detection highly crucial. Previous works proposed their own metrics to annotate whether a tweet is viral, in order to detect such tweets automatically. However, these metrics may not accurately represent viral tweets or may introduce too many false positives. In this work, we use the ground truth data provided by Twitter's "Viral Tweets" topic to review the current metrics and to propose our own. We find that a tweet is more likely to be classified as viral by Twitter if the ratio of retweets to its author's followers exceeds some threshold, which we found to be 2.16 in our experiments. This rule results in fewer false positives, although it favors smaller accounts. We also propose a transformer-based model for the early detection of viral tweets, which reports an F1 score of 0.79. The code and the tweet ids are publicly available at: https://github.com/tugrulz/ViralTweets
1. Introduction

Social media platforms have the power to shape public opinion and spark widespread conversations in a matter of seconds. On these platforms, posts may go viral, reaching thousands, if not millions, of people within a short period of time even when their authors were not initially popular. Those who seek to maximize influence may craft their campaigns around such posts so that they go viral. Unfortunately, adversaries may adopt the same approach and craft viral, misleading content such as fake news and conspiracy theories. Understanding viral posts may enable the early detection of such content. This is especially crucial for fact-checking: once a claim goes viral, the impact of fact-checking diminishes. On the other hand, it is infeasible for fact-checkers to proactively fact-check every claim. Early detection of viral posts may help them prioritize claims to fact-check (Guo et al., 2022). Moreover, early detection may facilitate analyzing viral posts before they are removed by the adversaries or the platforms (Elmas, 2023).
Past work on predicting viral social media posts mainly focused on Twitter. To build a ground truth set of viral tweets, previous studies used human labeling (Maldonado-Sifuentes et al., 2021) or defined their own measures based on public metrics as proxies for virality (Rameez et al., 2022; Maximilian Jenders, 2013). For instance, Jenders et al. (Maximilian Jenders, 2013) consider tweets with more retweets than a threshold as "viral". However, such proxies may be too restrictive (i.e., low recall on viral posts) or too lenient (many non-viral posts are labeled as viral). Furthermore, they fall short of accounting for the tweets' impact on the network, i.e., a tweet that is retweeted a lot may not always reach more users. Public metrics may also be manipulated (e.g., by bots (Elmas et al., 2021, 2022)), which may result in misclassifying non-viral tweets as viral. A more convincing approach is to predict the viral tweets of a given user, which controls for the effect of the user's network and their likelihood of using bots.
In 2021, Twitter launched the "Viral Tweets" topic, which discloses the tweets that went viral on the platform (e.g., Fig. 1) and provides reliable ground truth data. In this study, we survey the existing measures and evaluate their ability to capture the viral tweets provided by Twitter. We also propose our own metric based on the retweet counts of tweets and the follower counts of their authors, and show that it is more precise than the previous methods even though it requires less data. We further propose a transformer-based method to detect Twitter's viral tweets without relying on the tweets' or users' public metrics, facilitating the early detection of viral tweets. Our work will facilitate future research on measuring and detecting virality on social media.
2. Related Works
Previous works define and predict viral tweets based on their own definitions. Jenders et al. (Maximilian Jenders, 2013) employ tweet metadata features to predict the likelihood of a tweet becoming "viral", where viral tweets are defined as tweets that are retweeted at least T times, with T chosen as 50, 100, 500, or 1000. Maldonado-Sifuentes et al. (Maldonado-Sifuentes et al., 2021) employed RoBERTa to predict the virality of a tweet using a corpus of 5,000 tweets annotated by humans. Zadeh et al. (Zadeh and Sharda, 2022) propose and test a framework based on multivariate Hawkes processes to predict the popularity (defined as the sum of retweets, replies, and likes) of Twitter posts by brands, using regression rather than classification. Garimella et al. (Garimella and West, 2019) defined hot streaks (a series of viral tweets within a period) using tweets that have more retweets than 90% of the user's other tweets.
Our work differs from these in two aspects. First, rather than defining viral tweets ourselves based on proxies that may not be reliable, we tackle the prediction of viral tweets disclosed by Twitter itself. These tweets spread, or had the potential to spread, to a wide range of Twitter users; we assume that the platform can better define and model viral social media posts thanks to the internal metrics it has. Second, we create and evaluate our models in a setting that supports check-worthiness estimation in the fact-checking pipeline and aids fact-checkers, controlling for the user while predicting viral tweets.
Other works on viral tweets involve characterizing them. Samuel et al. (Samuel et al., 2020) analyzed a dataset of over one million tweets to understand the key drivers of successful information exchange and message diffusion on Twitter, focusing on endogenous and exogenous dimensions and providing insights and an early-stage model for explaining tweet performance. Sprejer et al. (Sprejer et al., 2021) investigate the virality of radical right content from 35 radical right influencers. They find that both influencer-level and content-level factors, including the number of followers, the type, length, and toxicity of the content, and requests for retweets, are important for engagement with tweets. Hoang et al. (Hoang et al., 2011) study the virality of socio-political tweet content in Singapore's 2011 general election by collecting tweet data from 20,000 Singaporean users and introducing several quantitative indices to measure the virality of retweeted tweets. They identify the most viral messages and the users behind them in the election and explain their behavior. Hasan et al. (Hasan et al., 2022) investigate the effects of virality on users' subsequent behaviors and long-term visibility on the platform using a dataset of tweeting activities and follower graph changes for 17,157 scientists on Twitter. Gurjar et al. (Gurjar et al., 2022) propose a framework to examine changes in user activity and the survival duration of effects associated with popularity shocks. Elmas et al. (Elmas et al., 2020) show that viral social media accounts may be sold and repurposed for malicious purposes later.
3. Data


Topics is a Twitter feature where users can subscribe to a feed of interest such as sports teams, art, food, etc. Twitter curates viral tweets (defined as tweets that are "Popular now") under the "Viral Tweets" topic. Unfortunately, the Twitter API does not yet provide an endpoint to collect tweets under a topic. Thus, we scraped 1,008 tweets from this page between October 2022 and November 2022 and collected their ids. We identified 814 users who authored viral tweets; 89 of them posted more than one viral tweet. We collected the last 3,200 tweets (the limit enforced by the API) of those users to build the dataset of non-viral tweets, excluding retweets. We collected an additional 1,137,050 tweets through this process.
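For illustration, the timeline collection step could look like the following sketch, assuming Tweepy with Twitter API v1.1 credentials; the credential placeholders and the `viral_author_ids` list are hypothetical, not part of our released code:

```python
# A minimal sketch of the timeline collection step (not our exact code),
# assuming Tweepy with Twitter API v1.1 credentials.
import tweepy

auth = tweepy.OAuth1UserHandler(
    "API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def collect_timeline(user_id, limit=3200):
    """Fetch up to `limit` recent tweets of a user, excluding retweets."""
    return [
        status for status in tweepy.Cursor(
            api.user_timeline,
            user_id=user_id,
            count=200,              # maximum page size for user_timeline
            include_rts=False,      # retweets are excluded, as in the text
            tweet_mode="extended",  # retrieve untruncated tweet text
        ).items(limit)
    ]

# `viral_author_ids` is a hypothetical list of the 814 author ids.
non_viral_tweets = [t for uid in viral_author_ids
                    for t in collect_timeline(uid)]
```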
To explore the datasets with respect to the public metrics used to measure virality, we show the distribution of the retweet counts of the viral tweets and the follower counts of their authors in Fig. 2. We observe that viral tweets are sourced mostly from unpopular profiles and that retweet counts are usually below 10,000.
4. Measuring Virality
Viral tweets are tweets that spread on Twitter and reach a large number of users in a short period. Since external researchers cannot reliably model how many users a tweet reaches, they generally measure virality using proxies, mainly based on the number of retweets. We now survey these measures.
RT > T: The number of retweets should be greater than a predefined hard threshold. The pitfall of this measure is that users with many followers may acquire more retweets than such a threshold from their own network alone, even though their tweets did not go viral. The measure relies only on the retweet count of the tweet. Thus, we propose the following measure that takes the overall performance of the user's tweets into account:
RT / Med. RT or RT / Avg. RT: The number of retweets divided by the median or the average retweet count of the user's tweets should be greater than some threshold. In the former case, tweets with zero retweets are not taken into account. The measure requires the user's timeline, i.e., recent tweets, to compute the median or the average. A similar approach was proposed by Garimella et al. (Garimella and West, 2019):
RT Percentile: The number of retweets should be greater than the k-th percentile of the user's tweets' retweet counts. The metric assumes that the user has a fixed share of viral tweets, which may be problematic if the user has no viral tweets at all or has many. This metric also requires the user's timeline.
RT / Followers: Instead of profiling users by their timeline, we use their number of followers: we normalize the number of retweets a tweet received by dividing it by the number of followers of its author.
log(RT / Followers): The number of retweets may not increase linearly with the number of followers. To account for this, we take the natural logarithm of the previous value. We discarded similar measures such as log(RT) / Followers and RT / log(Followers) as they were outperformed by this measure in our experiments.
Influence Score: The metric by Maldonado-Sifuentes et al. (Maldonado-Sifuentes et al., 2021). It is a function of the author's followers and followings counts and the tweet's retweets and favorites counts, combined with a constant that the authors set to 10; we refer the reader to the original paper for the exact formula.
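As a sketch (not our exact implementation), the surveyed measures could be expressed over a pandas DataFrame with hypothetical columns `user_id`, `retweets`, and `followers`; the influence score is omitted since its exact formula is given in the original paper, and thresholds without a stated default are placeholders:

```python
import numpy as np
import pandas as pd

def rt_threshold(df, T):
    # RT > T: hard retweet threshold.
    return df["retweets"] > T

def rt_over_median(df, threshold):
    # RT / Med. RT: retweets over the user's median retweet count,
    # with zero-retweet tweets excluded from the median, as described above.
    med = df[df["retweets"] > 0].groupby("user_id")["retweets"].median()
    return df["retweets"] / df["user_id"].map(med) > threshold

def rt_percentile(df, k):
    # RT Percentile: retweets above the k-th percentile of the user's tweets.
    cutoff = df.groupby("user_id")["retweets"].transform(
        lambda s: np.percentile(s, k))
    return df["retweets"] > cutoff

def rt_over_followers(df, threshold=2.16):
    # RT / Followers: 2.16 is the best threshold found in our experiments.
    return df["retweets"] / df["followers"] > threshold

def log_rt_over_followers(df, threshold=0.772):
    # log(RT / Followers): exp(0.772) ≈ 2.16. Zero-retweet tweets yield -inf
    # and are therefore never classified as viral.
    return np.log(df["retweets"] / df["followers"]) > threshold
```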
We run these metrics on all the tweets in our dataset to automatically classify whether a tweet is viral. We define true positives as tweets that are classified as viral by the metric and were also amplified by Twitter in the Viral Tweets topic. We adopt two approaches to define the set of false positives. The first uses the smallest set of tweets that the given metric classifies as viral when it reaches a 100% true positive rate; in this case, each metric has a different false positive set. The second uses the largest set, i.e., all 1.3M non-viral tweets in the dataset. We experiment with different thresholds and percentiles to compute true positive and false positive rates that range between 0 and 1.0 in steps of 0.01. Fig. 3 shows the ROC curves for all the metrics.

We compute the AUC with the former approach to computing the FPR using the standard method, as the FPR is already scaled between 0 and 1. For the latter approach, the FPRs are very low, so the resulting AUCs are very high and close to each other. Thus, we constrain the FPR to be between 0 and 0.016, as we observe that the ROC curves are mostly stable by this FPR. We then rescale the FPRs to [0, 1] and compute the AUC, which we name AUC-2. We also compute the number of false positives at TPR = 0.95 for each metric to show how lenient they are, i.e., how many new viral tweets they introduce. Table 1 shows the results. We observe that the influence score does better in AUC but performs poorly in AUC-2. This is because it is very lenient and introduces too many false positives: 225k when TPR = 0.95. Although a hard threshold (RT > T) is not as lenient, it requires a high FPR to achieve a high TPR. Meanwhile, both RT / Followers and log(RT / Followers) perform well on both metrics, but the latter is less lenient and has a higher harmonic mean of AUC and AUC-2. To put these into practice, the best threshold for log(RT / Followers) is 0.772, i.e., roughly 2,160 retweets per 1,000 followers (since e^0.772 ≈ 2.16, a tweet by an author with 5,000 followers would need more than about 10,800 retweets). The best hard threshold is 3,088 retweets. The former rule favors smaller accounts while the latter favors popular ones.
Metric | Data Required | AUC | AUC-2 | #Viral @ TPR=0.95
---|---|---|---|---
RT > T (Maximilian Jenders, 2013; Zadeh and Sharda, 2022) | Tweet Only | 0.82 | 0.78 | 12,439
RT / Med. RT | Timeline | 0.90 | 0.67 | 85,185
RT / Avg. RT | Timeline | 0.93 | 0.75 | 39,923
RT Percentile (Garimella and West, 2019) | Timeline | 0.84 | 0.64 | 203,539
RT / Followers | Profile | 0.92 | 0.82 | 14,007
log(RT / Followers) | Profile | 0.88 | 0.86 | 12,034
Influence Score (Maldonado-Sifuentes et al., 2021) | Profile | 0.96 | 0.70 | 225,314
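For concreteness, AUC-2 could be computed as in the following sketch, assuming labels `y_true` (1 if the tweet appeared in the Viral Tweets topic) and per-tweet metric values `scores` as inputs:

```python
# A sketch of the AUC-2 computation described above: restrict the ROC curve
# to FPR <= 0.016, rescale the FPRs back to [0, 1], then apply standard AUC.
import numpy as np
from sklearn.metrics import auc, roc_curve

def auc2(y_true, scores, fpr_cap=0.016):
    fpr, tpr, _ = roc_curve(y_true, scores)
    mask = fpr <= fpr_cap               # constrain FPR to [0, 0.016]
    fpr_rescaled = fpr[mask] / fpr_cap  # rescale the FPRs to [0, 1]
    return auc(fpr_rescaled, tpr[mask])
```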
5. Detection
5.1. Motivation & Problem
Our goal in predicting viral tweets is to aid fact-checking by proactively detecting tweets that may spread to many social media users in a short period and make an impact. Thus, we do not use any engagement information (e.g., like count) and focus on content features only. We assume that fact-checkers track a set of users who may share misleading content in real time, detect whether their tweets contain a claim, and estimate the check-worthiness of those claims. Since there may be too many tweets to assess for check-worthiness, or too many check-worthy tweets to fact-check, it may be a better strategy to track the tweets that are likely to go viral and prioritize them. Thus, we formulate our problem as follows: "Given the tweets from a set of users, which tweets are likely to go viral?"
We construct the dataset according to this problem, constraining it to the viral tweets and their authors' non-viral tweets posted on the same day. We only use tweets in English. This leaves us with 787 viral tweets and 15,904 non-viral tweets. We randomly sample from the non-viral set to obtain a balanced dataset, yielding a balanced training set of 1,260 tweets and a test set of 314 tweets.
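A minimal sketch of this construction, assuming a DataFrame `tweets` with hypothetical columns `lang` and `is_viral` (1 if the tweet appeared in the Viral Tweets topic):

```python
import pandas as pd

english = tweets[tweets["lang"] == "en"]
viral = english[english["is_viral"] == 1]      # 787 tweets
non_viral = english[english["is_viral"] == 0]  # 15,904 tweets

# Balance the classes by downsampling the non-viral tweets, then shuffle.
balanced = pd.concat([
    viral,
    non_viral.sample(n=len(viral), random_state=42),
]).sample(frac=1, random_state=42)

train, test = balanced.iloc[:1260], balanced.iloc[1260:]  # 1,260 / 314 tweets
```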
5.2. Feature Engineering
We mainly employ the text content as the feature and use transformer-based language models to represent it. We also use additional features that the language models may not capture. In our experiments, we observe that these additional features slightly increase our model's performance. They are boolean features indicating whether the tweet contains media, hashtags, or mentions, whether it has positive or negative sentiment, and whether it is sourced from a verified account. We used the distilbert-base-uncased-finetuned-sst-2-english model to compute sentiment, which predicts whether the tweet is positive or negative along with a confidence score. We assign a sentiment to the tweet only if the confidence score is higher than 0.7 (a sketch of this step follows Table 2). Table 2 summarizes the features.
Feature | Viral | Non-Viral | Diff
---|---|---|---
Contains Media | 62.1% | 21.7% | +40.4%
Contains Hashtags | 5.85% | 3.03% | +2.82%
From Verified Account | 5.46% | 7.24% | -1.78%
Positive Sentiment | 25.2% | 40.0% | -14.8%
Negative Sentiment | 74.8% | 60.0% | +14.8%
Contains Mentions | 42.76% | 41.12% | +1.64%
Mean Tweet Length (chars) | 88.3 | 64.9 | +23.4
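The sentiment features could be computed as in the following sketch, which uses the model named above; the 0.7 confidence cutoff is from the text, while the function name and feature keys are hypothetical:

```python
from transformers import pipeline

# Sentiment classifier used for the sentiment features.
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def sentiment_features(text):
    result = sentiment(text, truncation=True)[0]  # {'label': ..., 'score': ...}
    confident = result["score"] > 0.7  # assign a sentiment only if confident
    return {
        "positive_sentiment": confident and result["label"] == "POSITIVE",
        "negative_sentiment": confident and result["label"] == "NEGATIVE",
    }
```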
5.3. Experimental Results
We used the following transformer-based language models via HuggingFace: BERT-Base (Devlin et al., 2018), RoBERTa (Liu et al., 2019), TinyBERT (Jiao et al., 2019), and BERTweet (Nguyen et al., 2020). We only used the case-sensitive models, as we observe that users use upper case when they want to emphasize a certain part of their tweets. We experimented with models that rely only on the text content and models that concatenate the features we created with the text features (marked with * in Table 3). We evaluated the models using accuracy, precision, recall, and F1. Table 3 shows the results; a minimal fine-tuning sketch follows the table. We observe that BERTweet yields the best F1 and that using the extra features increases it by 0.027.
Model | Prec | Prec* | Recall | Recall* | F1 | F1* |
---|---|---|---|---|---|---|
BERT-Base | 0.670 | 0.666 | 0.764 | 0.815 | 0.714 | 0.734 |
RoBERTa | 0.704 | 0.681 | 0.834 | 0.860 | 0.764 | 0.761 |
TinyBERT | 0.690 | 0.668 | 0.834 | 0.885 | 0.755 | 0.762 |
BERTweet | 0.717 | 0.740 | 0.822 | 0.854 | 0.766 | 0.793 |
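As a rough illustration of the setup above (not our exact training code), BERTweet could be fine-tuned for this binary task via HuggingFace as follows; hyperparameters are placeholders, the feature-concatenation variant (* columns) is omitted, and `train_texts`/`train_labels` are assumed to come from the balanced training set:

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=2)  # viral vs. non-viral

class TweetDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="viral-bertweet", num_train_epochs=3),
    train_dataset=TweetDataset(train_texts, train_labels),
)
trainer.train()
```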
6. Conclusion and Future Work
This study improves our understanding of virality on social media by testing existing metrics on reliable ground truth data, proposing a new metric, and predicting viral tweets. Our analysis predates Twitter's disclosure of view counts. However, our virality metrics may benefit work on other social media platforms that do not disclose such data. Additionally, our work on predicting viral tweets using language models may inspire future work that automatically generates content likely to go viral, which may help experts and fact-checkers create content that better resonates with the public. Furthermore, viral tweets tend to contain more media, and further research could focus on modeling this.
Ethical Disclosure: We only used the data from public profiles amplified by Twitter. We only disclose the tweet ids from the data.
References
- Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
- Elmas (2023) Tuğrulcan Elmas. 2023. The Impact of Data Persistence Bias on Social Media Studies. arXiv preprint arXiv:2303.00902 (2023).
- Elmas et al. (2022) Tuğrulcan Elmas, Rebekah Overdorf, and Karl Aberer. 2022. Characterizing Retweet Bots: The Case of Black Market Accounts. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 171–182.
- Elmas et al. (2020) Tuğrulcan Elmas, Rebekah Overdorf, Ömer Faruk Akgül, and Karl Aberer. 2020. Misleading repurposing on twitter. arXiv preprint arXiv:2010.10600 (2020).
- Elmas et al. (2021) Tuğrulcan Elmas, Rebekah Overdorf, Ahmed Furkan Özkalay, and Karl Aberer. 2021. Ephemeral astroturfing attacks: The case of fake twitter trends. In 2021 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 403–422.
- Garimella and West (2019) Kiran Garimella and Robert West. 2019. Hot streaks on social media. In Proceedings of the international AAAI conference on web and social media, Vol. 13. 170–180.
- Guo et al. (2022) Zhijiang Guo, Michael Schlichtkrull, and Andreas Vlachos. 2022. A survey on automated fact-checking. Transactions of the Association for Computational Linguistics 10 (2022), 178–206.
- Gurjar et al. (2022) Omkar Gurjar, Tanmay Bansal, Hitkul Jangra, Hemank Lamba, and Ponnurangam Kumaraguru. 2022. Effect of Popularity Shocks on User Behaviour. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 253–263.
- Hasan et al. (2022) Rakibul Hasan, Cristobal Cheyre, Yong-Yeol Ahn, Roberto Hoyle, and Apu Kapadia. 2022. The Impact of Viral Posts on Visibility and Behavior of Professionals: A Longitudinal Study of Scientists on Twitter. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 16. 323–334.
- Hoang et al. (2011) Tuan-Anh Hoang, Ee-Peng Lim, Palakorn Achananuparp, Jing Jiang, and Feida Zhu. 2011. On modeling virality of twitter content. In International conference on Asian digital libraries. Springer, 212–221.
- Jiao et al. (2019) Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu. 2019. Tinybert: Distilling bert for natural language understanding. arXiv preprint arXiv:1909.10351 (2019).
- Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
- Maldonado-Sifuentes et al. (2021) Christian E Maldonado-Sifuentes, Jason Angel, Grigori Sidorov, Olga Kolesnikova, and Alexander Gelbukh. 2021. Virality Prediction for News Tweets Using RoBERTa. In Mexican International Conference on Artificial Intelligence. Springer, 81–95.
- Maximilian Jenders (2013) Maximilian Jenders, Gjergji Kasneci, and Felix Naumann. 2013. Analyzing and predicting viral tweets. (2013).
- Nguyen et al. (2020) Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. 2020. BERTweet: A pre-trained language model for English Tweets. arXiv preprint arXiv:2005.10200 (2020).
- Rameez et al. (2022) Rikaz Rameez, Hossein A Rahmani, and Emine Yilmaz. 2022. ViralBERT: A User Focused BERT-Based Approach to Virality Prediction. (2022).
- Samuel et al. (2020) Jim Samuel, Myles Garvey, and Rajiv Kashyap. 2020. That message went viral?! exploratory analytics and sentiment analysis into the propagation of tweets. arXiv preprint arXiv:2004.09718 (2020).
- Sprejer et al. (2021) Laila Sprejer, Helen Margetts, Kleber Oliveira, David O’Sullivan, and Bertie Vidgen. 2021. An influencer-based approach to understanding radical right viral tweets. arXiv preprint arXiv:2109.07588 (2021).
- Zadeh and Sharda (2022) Amir Zadeh and Ramesh Sharda. 2022. How Can Our Tweets Go Viral? Point-Process Modelling of Brand Content. Information & Management 59, 2 (2022), 103594.